In this paper, we aim to design an efficient real-time object detector that exceeds the YOLO series and is easily extensible for many object recognition tasks such as instance segmentation and rotated object detection. To obtain a more efficient model architecture, we explore an architecture that has compatible capacities in the backbone and neck, constructed by a basic building block that consists of large-kernel depth-wise convolutions. We further introduce soft labels when calculating matching costs in the dynamic label assignment to improve accuracy. Together with better training techniques, the resulting object detector, named RTMDet, achieves 52.8% AP on COCO with 300+ FPS on an NVIDIA 3090 GPU, outperforming the current mainstream industrial detectors. RTMDet achieves the best parameter-accuracy trade-off with tiny/small/medium/large/extra-large model sizes for various application scenarios, and obtains new state-of-the-art performance on real-time instance segmentation and rotated object detection. We hope the experimental results can provide new insights into designing versatile real-time object detectors for many object recognition tasks. Code and models are released at https://github.com/open-mmlab/mmdetection/tree/3.x/configs/rtmdet.
translated by Google Translate
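Scientific literature is a high-quality corpus that supports a large body of natural language processing (NLP) research. However, existing datasets are centered on English, which limits the development of Chinese scientific NLP. In this work, we present CSL, a large-scale Chinese scientific literature dataset containing the titles, abstracts, keywords, and academic fields of 396k papers. To our knowledge, CSL is the first scientific document dataset in Chinese. CSL can serve as a Chinese corpus. In addition, its semi-structured data is a natural annotation that can form the basis of many supervised NLP tasks. Based on CSL, we present a benchmark to evaluate the performance of models across scientific-domain tasks, namely summarization, keyword generation, and text classification. We analyze the behavior of existing text-to-text models on the evaluation tasks and reveal the challenges of Chinese scientific NLP tasks, which provides a valuable reference for future research. Data and code are available at https://github.com/ydli-ai/csl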
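Place recognition gives a SLAM algorithm the ability to eliminate accumulated errors and relocalize itself. Existing methods for point-cloud-based place recognition usually rely on matching global descriptors that are centered on the LiDAR. These methods have two major drawbacks: place recognition cannot be performed when the two point clouds are far apart, and only the rotation angle can be computed, without the offsets in the x and y directions. To address these two problems, we propose a novel global descriptor built around the main object, so that the descriptor no longer depends on the observation position. We show theoretically that the method can fully resolve the two problems above, and we conduct many experiments on KITTI and in some extreme scenarios, which show that our method has clear advantages over traditional methods.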
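$k$-means clustering is a fundamental problem in many disciplines. The problem is nonconvex, and standard algorithms are only guaranteed to find a local optimum. Leveraging the structure of local solutions characterized in [1], we propose a general algorithmic framework for escaping bad local solutions and recovering the global solution (or the ground truth). The framework alternates between two steps: (i) detecting mis-specified clusters in a local solution and (ii) improving the current local solution by non-local operations. We discuss the implementation of these steps and elucidate how the proposed framework unifies variants of the $k$-means algorithm in the literature from a geometric perspective. In addition, we introduce two natural extensions of the proposed framework in which the initial number of clusters is misspecified. We provide theoretical justification for our approach, which is corroborated by extensive experiments.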
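To make the local-optimum issue concrete, here is a minimal sketch of Lloyd's algorithm in plain Python, used as an example of a standard $k$-means algorithm. The toy points, the two initializations, and the function names are invented for illustration. The sketch does not implement the detection or non-local steps of the proposed framework.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd(points, centers, max_iter=100):
    """Plain Lloyd iterations: assign to nearest center, then recompute means."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(xs) / len(c) for xs in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:  # fixed point reached
            break
        centers = new_centers
    cost = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, cost


# Four corners of a 4 x 1 rectangle, k = 2.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
_, bad_cost = lloyd(points, [(2, 0), (2, 1)])       # splits top/bottom
_, good_cost = lloyd(points, [(0, 0.5), (4, 0.5)])  # splits left/right
print(bad_cost, good_cost)  # 16 1.0
```

Both runs stop at a fixed point of the iteration, but the first one has sixteen times the cost of the second, which is the kind of bad local solution that an escape step would need to detect and repair.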
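Background and Objective: Gastric cancer has become the fifth most common cancer worldwide, and early detection of gastric cancer is essential for saving lives. Histopathological examination is the gold standard for diagnosing gastric cancer. However, computer-aided diagnostic techniques are difficult to evaluate because publicly available gastric histopathology image datasets are scarce. Methods: In this paper, a novel public Gastric Histopathology Sub-size Image Database (GasHisSDB) is published to assess the performance of classifiers. Specifically, it includes two types of data, normal and abnormal, with a total of 245,196 tissue case images. To show that image classification methods from different periods perform differently on GasHisSDB, we select a variety of classifiers for evaluation: seven classical machine learning classifiers, three convolutional neural network classifiers, and a novel transformer-based classifier are tested on image classification tasks. Results: This study performs extensive experiments with traditional machine learning and deep learning methods to show that methods from different periods perform differently on GasHisSDB. Traditional machine learning achieves a best accuracy of 86.08% and a lowest accuracy of only 41.12%. Deep learning achieves a best accuracy of 96.47% and a lowest accuracy of 86.21%. Accuracy varies significantly across classifiers. Conclusions: To the best of our knowledge, this is the first publicly available gastric cancer histopathology dataset containing a large number of images for weakly supervised learning. We believe that GasHisSDB can attract researchers to explore new algorithms for the automated diagnosis of gastric cancer, which can help physicians and patients in clinical settings.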
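Existing results for low-rank matrix recovery largely focus on quadratic loss, which enjoys favorable properties such as restricted strong convexity/smoothness (RSC/RSM) and good conditioning over all low-rank matrices. However, many interesting problems involve more general, non-quadratic losses, which do not satisfy these properties. For these problems, standard nonconvex approaches such as rank-constrained projected gradient descent (a.k.a. iterative hard thresholding) and Burer-Monteiro factorization can have poor empirical performance, and there is no satisfactory theory guaranteeing global and fast convergence of these algorithms. In this paper, we show that a critical component in provable low-rank recovery with non-quadratic loss is a regularity projection oracle. This oracle restricts iterates to low-rank matrices within an appropriate bounded set, over which the loss function is well behaved and satisfies a set of approximate RSC/RSM conditions. Accordingly, we analyze an (averaged) projected gradient method equipped with such an oracle and prove that it converges globally and linearly. Our results apply to a wide range of non-quadratic low-rank estimation problems, including one-bit matrix sensing/completion, individualized rank aggregation, and, more broadly, generalized linear models with rank constraints.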
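For readers unfamiliar with the baseline named above, the following is a minimal NumPy sketch of rank-constrained projected gradient descent (iterative hard thresholding). It assumes the well-behaved quadratic loss $f(X) = \tfrac{1}{2}\|\mathrm{mask} \odot (X - M)\|_F^2$ and does not implement the regularity projection oracle. The function names, step size, and toy matrix are illustrative choices, not taken from the paper.

```python
import numpy as np


def project_rank(X, r):
    """Best rank-r approximation of X (Eckart-Young) via truncated SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]


def iht(M_obs, mask, r, step=1.0, n_iter=50):
    """Projected gradient descent on 0.5 * ||mask * (X - M_obs)||_F^2
    over the set of matrices with rank at most r."""
    X = np.zeros_like(M_obs, dtype=float)
    for _ in range(n_iter):
        grad = mask * (X - M_obs)                  # gradient of the quadratic loss
        X = project_rank(X - step * grad, r)       # gradient step, then rank projection
    return X


rng = np.random.default_rng(0)
M = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 20))  # rank-2 ground truth
mask = np.ones_like(M)                                           # every entry observed
X_hat = iht(M, mask, r=2, n_iter=5)
print(np.linalg.norm(X_hat - M))
```

Setting some entries of `mask` to zero turns this into a matrix completion baseline. For a non-quadratic loss, only the `grad` line changes, and that is the setting in which the abstract says this plain rank projection may no longer be enough.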
Recently, the success of pre-training in the text domain has been extended to vision, audio, and cross-modal scenarios. The pre-training models proposed for different modalities are becoming increasingly homogeneous in structure, which creates an opportunity to implement them within a uniform framework. In this paper, we present TencentPretrain, a toolkit supporting pre-training models of different modalities. The core feature of TencentPretrain is its modular design. The toolkit uniformly divides pre-training models into 5 components: embedding, encoder, target embedding, decoder, and target. As almost all common modules are provided in each component, users can choose the desired modules from different components to build a complete pre-training model. The modular design enables users to efficiently reproduce existing pre-training models or build brand-new ones. We test the toolkit on text, vision, and audio benchmarks and show that it can match the performance of the original implementations.
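The idea of choosing one module per component can be sketched in a few lines of Python. Everything below is hypothetical: the registry, the module names, and the toy functions are placeholders and are not TencentPretrain's actual API. Chaining the five components in a straight line is also a simplification, since a real target embedding would consume target tokens rather than the encoder output.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical registry: interchangeable toy modules for each of the 5 components.
REGISTRY: Dict[str, Dict[str, Callable]] = {
    "embedding": {"word": lambda tokens: [float(len(t)) for t in tokens]},
    "encoder": {"identity": lambda h: h, "double": lambda h: [2 * x for x in h]},
    "tgt_embedding": {"none": lambda h: h},
    "decoder": {"none": lambda h: h, "reverse": lambda h: h[::-1]},
    "target": {"sum": lambda h: sum(h)},
}
COMPONENTS = ["embedding", "encoder", "tgt_embedding", "decoder", "target"]


@dataclass
class Model:
    stages: List[Callable]

    def __call__(self, x):
        # Run the input through the selected module of each component in order.
        for stage in self.stages:
            x = stage(x)
        return x


def build_model(choices: Dict[str, str]) -> Model:
    """Pick one registered module per component; raises KeyError if one is missing."""
    return Model([REGISTRY[name][choices[name]] for name in COMPONENTS])


model = build_model({
    "embedding": "word",
    "encoder": "double",
    "tgt_embedding": "none",
    "decoder": "reverse",
    "target": "sum",
})
print(model(["ab", "cde"]))  # 10.0
```

Swapping `"double"` for `"identity"` changes only the encoder stage, which is the property a modular design of this kind is meant to provide.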
Noninvasive X-ray imaging of nanoscale three-dimensional objects, e.g. integrated circuits (ICs), generally requires two types of scanning: ptychographic scanning, which is translational and returns estimates of the complex electromagnetic field through the IC; and tomographic scanning, which collects complex field projections from multiple angles. Here, we present Attentional Ptycho-Tomography (APT), an approach trained to provide accurate reconstructions of ICs despite incomplete measurements, using a dramatically reduced amount of angular scanning. The training process includes regularizing priors based on typical IC patterns and the physics of X-ray propagation. We demonstrate that APT with 12-fold fewer angles achieves fidelity comparable to the gold standard obtained with the original set of angles. With the same reduced set of angles, APT also outperforms baseline reconstruction methods. In our experiments, APT achieves a 108-fold aggregate reduction in data acquisition and computation without compromising quality. We expect that our physics-assisted machine learning framework could also be applied to other branches of nanoscale imaging.
The Abstraction and Reasoning Corpus (ARC) aims at benchmarking the performance of general artificial intelligence algorithms. The ARC's focus on broad generalization and few-shot learning has made it difficult to solve using pure machine learning. A more promising approach has been to perform program synthesis within an appropriately designed Domain Specific Language (DSL). However, these too have seen limited success. We propose Abstract Reasoning with Graph Abstractions (ARGA), a new object-centric framework that first represents images using graphs and then performs a search for a correct program in a DSL that is based on the abstracted graph space. The complexity of this combinatorial search is tamed through the use of constraint acquisition, state hashing, and Tabu search. An extensive set of experiments demonstrates the promise of ARGA in tackling some of the complicated object-centric tasks of the ARC rather efficiently, producing programs that are correct and easy to understand.
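As a rough illustration of representing an image as a graph, the sketch below turns a small color grid into nodes and edges. The specific abstraction is an assumption made for this example: nodes are 4-connected components of same-colored, non-background pixels, and edges join components that touch. The abstract does not state which graph abstractions ARGA uses, and the function name and toy grid are invented.

```python
from collections import deque


def grid_to_graph(grid, background=0):
    """Nodes: 4-connected same-color components of non-background pixels.
    Edges: pairs of node indices whose components contain 4-adjacent pixels."""
    h, w = len(grid), len(grid[0])
    label = [[-1] * w for _ in range(h)]  # node index per pixel, -1 = unlabeled
    nodes = []
    for r in range(h):
        for c in range(w):
            if grid[r][c] == background or label[r][c] != -1:
                continue
            idx, color, pixels = len(nodes), grid[r][c], []
            label[r][c] = idx
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill over same-colored neighbors
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and label[ny][nx] == -1 and grid[ny][nx] == color):
                        label[ny][nx] = idx
                        queue.append((ny, nx))
            nodes.append({"color": color, "pixels": pixels})
    edges = set()
    for r in range(h):
        for c in range(w):
            a = label[r][c]
            if a == -1:
                continue
            for dy, dx in ((1, 0), (0, 1)):  # check the down and right neighbors
                ny, nx = r + dy, c + dx
                if ny < h and nx < w:
                    b = label[ny][nx]
                    if b != -1 and b != a:
                        edges.add((min(a, b), max(a, b)))
    return nodes, edges


grid = [
    [1, 1, 0, 2],
    [1, 0, 0, 2],
    [3, 3, 0, 0],
]
nodes, edges = grid_to_graph(grid)
print([n["color"] for n in nodes], edges)  # [1, 2, 3] {(0, 2)}
```

A program search can then operate on a handful of object nodes instead of individual pixels, which is the reduction in search space that an object-centric representation aims for.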
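Coherent microscopy techniques provide an unparalleled multi-scale view of materials across scientific and technological fields, from structural materials to quantum devices, and from integrated circuits to biological cells. Driven by the construction of brighter sources and high-rate detectors, coherent X-ray microscopy methods such as ptychography are poised to revolutionize nanoscale materials characterization. However, the associated significant increases in data and compute needs mean that conventional approaches no longer suffice for recovering sample images in real time from high-speed coherent imaging experiments. Here, we demonstrate a workflow that leverages artificial intelligence at the edge and high-performance computing to enable real-time inversion of X-ray ptychography data streamed directly from a detector. The proposed AI-enabled workflow eliminates the sampling constraints imposed by traditional ptychography, allowing low-dose imaging with orders of magnitude less data than traditional methods require.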